Add MambaForSequenceClassification #29552
Conversation
Hi @mjschook, thanks for opening this PR and adding this!
AFAICT, these changes seem reasonable to me. @ArthurZucker is off for a week, so let's wait for him to come back to confirm whether there's any reason not to add this to Mamba.
A few things will need to be added:
- Tests for the model, i.e. equivalent to `create_and_check_mamba_model`, and the model should be added to `all_model_classes` (see the sketch after this list)
- The model needs to be documented in `mamba.md`
- All the tests in the CI should be passing
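For illustration, a hedged sketch of what such a tester method might look like, mirroring the `create_and_check_mamba_model` pattern; the method name, the `ids_tensor`/`torch_device` helpers, and the tester attributes are assumed from the library's common test scaffolding, not taken from this PR:

```python
# Hypothetical tester method for the new head; names and shapes are assumptions.
def create_and_check_mamba_for_sequence_classification(self, config, input_ids, *args):
    config.num_labels = self.num_labels
    model = MambaForSequenceClassification(config)
    model.to(torch_device)
    model.eval()
    labels = ids_tensor([self.batch_size], self.num_labels)
    result = model(input_ids, labels=labels)
    # The head should produce one logit vector per sequence
    self.parent.assertEqual(result.logits.shape, (self.batch_size, self.num_labels))
    self.parent.assertIsNotNone(result.loss)
```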
x = features[:, 0, :]  # take <s> token (equiv. to [CLS])
x = self.dropout(x)
x = self.dense(x)
x = ACT2FN[self.config.hidden_act](x)
The activation layer should be set in the `__init__` and then called here, i.e.:

def __init__(...):
    ...
    self.activation = ACT2FN[config.hidden_act]

def forward(...):
    ...
    x = self.activation(x)
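Putting the pieces together, a minimal sketch of the full head with the activation resolved once in `__init__`, as suggested above; the `classifier_dropout` field and the `out_proj` layer are illustrative assumptions, not necessarily what the PR uses:

```python
import torch.nn as nn
from transformers.activations import ACT2FN


class MambaClassificationHead(nn.Module):
    """Sketch of a sentence-level classification head on top of MambaModel."""

    def __init__(self, config):
        super().__init__()
        self.dense = nn.Linear(config.hidden_size, config.hidden_size)
        # `classifier_dropout` is an assumed config field; fall back to 0.1 if absent
        self.dropout = nn.Dropout(getattr(config, "classifier_dropout", 0.1))
        # Resolve the activation once here instead of indexing ACT2FN in forward()
        self.activation = ACT2FN[config.hidden_act]
        self.out_proj = nn.Linear(config.hidden_size, config.num_labels)

    def forward(self, features, **kwargs):
        x = features[:, 0, :]  # take <s> token (equiv. to [CLS])
        x = self.dropout(x)
        x = self.dense(x)
        x = self.activation(x)
        x = self.dropout(x)
        x = self.out_proj(x)
        return x
```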
self.classifier = MambaClassificationHead(config)

for param in self.base_model.parameters():
    param.requires_grad = False
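For context, a hedged sketch of where these lines might sit in the model's `__init__`; the `backbone` attribute name and the use of `base_model` follow the library's usual conventions and are assumptions here rather than the PR's exact code:

```python
class MambaForSequenceClassification(MambaPreTrainedModel):
    def __init__(self, config):
        super().__init__(config)
        self.num_labels = config.num_labels
        self.backbone = MambaModel(config)  # assumed attribute name
        self.classifier = MambaClassificationHead(config)

        # Freeze the backbone so only the classification head trains by default
        for param in self.base_model.parameters():
            param.requires_grad = False

        self.post_init()
```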
I'm not sure whether we actually want to freeze the params for the base model here, but I found there's a test specific to sequence classification that expects all the unfrozen params to be initialized in the range [0.0, 1.0], and the initialized values for the mixer don't satisfy that assertion.
So... I froze them and made sure the classification head params were initialized to satisfy the test. It makes intuitive sense to me to freeze them in the case of transfer learning for this task. I did confirm that running LoRA PEFT with `target_modules=["x_proj", "embeddings", "in_proj", "out_proj"]` and `task_type=TaskType.SEQ_CLS` does unfreeze the target modules (see the sketch below), so it appears to work fine, but I'm not sure if we want to force them to be frozen by default.
Anyway, happy to adjust if there's a better practice to follow here.
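For reference, a hedged sketch of the LoRA/PEFT setup described above; the rank and other hyperparameters are illustrative placeholders, and `model` is assumed to be an instance of the proposed `MambaForSequenceClassification`:

```python
from peft import LoraConfig, TaskType, get_peft_model

lora_config = LoraConfig(
    task_type=TaskType.SEQ_CLS,
    r=8,  # illustrative rank
    lora_alpha=16,
    lora_dropout=0.1,
    target_modules=["x_proj", "embeddings", "in_proj", "out_proj"],
)
peft_model = get_peft_model(model, lora_config)
# LoRA adapters on the target modules are trainable even though the backbone was frozen
peft_model.print_trainable_parameters()
```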
…prepare_config_and_inputs
…classifier head linear layer weights
Hey! We usually do the following before merging such PRs:
- Create a feature request issue to add this new class
- Wait until the community picks this up
- Wait until there are actually pretrained checkpoints released by the community or the authors

As is, this does not really help anyone, as it can be easily implemented by anyone who wants to train a model, no?
Thanks for the feedback @ArthurZucker - I'll close it since I went in another direction, using prompt tuning instead. I'll keep the process you laid out in mind for the future. =)
It would be helpful to have this class. Looking forward to #30431
What does this PR do?
Adds `MambaForSequenceClassification` for sequence classification with the `MambaModel` backbone.
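To illustrate the intended usage, a hedged sketch; the checkpoint name and label count are placeholders, since no pretrained classification checkpoint is attached to this PR:

```python
import torch
from transformers import AutoTokenizer, MambaForSequenceClassification

# Placeholder checkpoint; only the backbone weights would be pretrained
tokenizer = AutoTokenizer.from_pretrained("state-spaces/mamba-130m-hf")
model = MambaForSequenceClassification.from_pretrained(
    "state-spaces/mamba-130m-hf", num_labels=2
)

inputs = tokenizer("This movie was great!", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
predicted_class_id = logits.argmax(dim=-1).item()
```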
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.
@ArthurZucker thanks for your work bringing in Mamba! I'm wondering if there's any objection to adding a `MambaForSequenceClassification` model? I followed the template example as best I could, and I'm happy to continue with adding the new test and finishing it up.